ABSTRACT

The Panel on Manpower Training Evaluation has
recommended that Social Security earnings data be more widely used in
evaluating manpower programs, especially those that tend to serve
prime-age males, such as the MDTA (Manpower Development and Training Act)
or NAB-JOBS (National Alliance of Businessmen-Job Opportunities in
the Business Sector) programs. In spite of some limitations, earnings
data provide very accurate and inexpensive longitudinal information
that can be efficiently analyzed and provide as adequate a source of 
comparison groups as tailor-made sample survey studies, with the 
additional potentiality for matching pairs of observations on 
selected characteristics such as prior earnings patterns. The panel 
recognizes the issue of confidentiality and expects adherence to 
standards set for public use of government data. The panel has
stressed that evaluation is limited by the quality of information 
available on the population of manpower program participants. A 
suggested approach was to develop accurate samples rather than 
attempt to gather information on all trainees; however, this 
precludes development of accurate trainee lists for each project. A 
concluding recommendation was the undertaking of a study comparing 
the outcomes of a true experimental design to evaluate a manpower 
program with outcomes as measured by Social Security data. 
NOTICE



The project which is the subject of this report was approved 
by the Governing Board of the National Research Council, acting 
in behalf of the National Academy of Sciences. Such approval re- 
flects the Board's judgment that the project is of national 
importance and appropriate with respect to both the purposes and 
resources of the National Research Council. 

The members of the committee selected to undertake this 
project and prepare this report were chosen for recognized 
scholarly competence and with due consideration for the balance 
of disciplines appropriate to the project. Responsibility for 
the detailed aspects of this report rests with that committee. 

Each report issuing from a study committee of the National 
Research Council is reviewed by an independent group of qualified 
individuals according to procedures established and monitored by
the Report Review Committee of the National Academy of Sciences. 
Distribution of the report is approved by the President of the
Academy upon satisfactory completion of the review process.
PREFACE

The task set by the Department of Labor in its inquiry to
the National Academy of Sciences involves a series of technical 
scientific assessments of the usefulness of Social Security data 
in the evaluation of manpower programs. The Department of Labor 
has been investigating the possibility of using economic and 
socio-demographic data from the earnings and tax records of the 
Social Security Administration. It is clear that Social Security 
data files are reliable and inexpensive sources of data. 

This report presents the strong points and weaknesses
of the Social Security data and, where appropriate, compares
the advantages and problems of these data with other sources in
the area of manpower evaluation. A positive recommendation is
given for the program of analysis.

The Panel met three times in completing its task. 
The Department of Labor was fully cooperative in the
exchange of information and opinion. A special recognition is 
owed to Ernst W. Stromsdorfer (Indiana University), who served
as Consultant, and who drafted this report, and to Orley 
Ashenfelter, Department of Labor (now at Princeton University), 
who provided direct, effective, and continuous collaboration, 
greatly expediting the work of the Panel. 



Sherwin Rosen 
University of Rochester 
Chairman, Panel on Manpower 
Training Evaluation 






PANEL ON MANPOWER TRAINING EVALUATION

The Panel on Manpower Training Evaluation was established
in the Assembly of Behavioral and Social Sciences at the request
of the Department of Labor to carry out a study and analysis of
the use of Social Security earnings data to assess the effects
of manpower training programs. The specific tasks were (1) to
review and examine the technical adequacy of earnings data for
assessing the effects of these programs, (2) to suggest improvements
in the methodology, and priorities for further analysis, (3) to
comment on the appropriateness of these data for policy and
program decisions, and (4) to compare the relative merits of this
technique with others.

The members of the Panel were: Sherwin Rosen (University
of Rochester), Chairman, Nathan Caplan (University of Michigan),
Stanley Lebergott (Wesleyan University), Henry M. Levin (Stanford
University), Robert A. Levine (RAND Corporation), Richard Light
(Harvard University), and Finis Welch (University of California,
Los Angeles). Ernst W. Stromsdorfer (Indiana University) served
as a consultant to the Panel throughout its tenure. Sherman Ross 
was the Executive Secretary for the Panel, and Ms. Barbara 
Arenson served as secretary. 



FINDINGS AND RECOMMENDATIONS 

(1) The Panel recommends that Social Security data, including the 
Continuous Work History Sample (CWHS), be more widely used in the 
evaluation of manpower programs. For some programs, the evalua- 
tions will be particularly useful and reliable. Failure to use 
these data will result in excessive evaluation costs to the 
federal government, with no corresponding gain in quality of 
evaluation. The reasons for the Panel's position are as follows: 
(a) Accuracy of Social Security earnings data is con-
siderably higher than that of the comparable data retrospective
sample surveys offer. There are no problems of recall and
interviewer or interviewee bias. Non-response bias, the
bane of sample surveys, is not a problem.
(b) Social Security data can complement the results of
carefully designed field evaluation studies at very low 
cost. 

(c) Social Security data are considerably cheaper to ac- 
quire than data derived from sample surveys; a few cents 
per observation for Social Security data compared with tens 
of dollars per observation for sample survey data. 

(d) Use of Social Security data for comparing earnings 
performance of trainees and non-trainees is more reliable 
than comparisons based on data used in most sample survey 
evaluations of manpower programs. 

(e) Appropriate econometric methods exist to analyze the 
Social Security data efficiently. 

(2) The Panel recognizes that the issue of confidentiality of the 
data contained in Social Security records is a difficult one. It
is inappropriate for the federal government to release information
which has been entrusted to it in good faith as confidential to
private citizens or to other agencies in government. Therefore, 
given that these data are to be more widely used, as the Panel 
recommends they be, strict controls and sanctions on public use 
must be applied to prevent illegal use by private individuals 
and government agencies, whether they be federal, state, or local.
We expect that the standards set for public use of other data
collected by the federal government, such as the census of popu-
lation, will be adhered to in manpower evaluation studies. We
see no reason why such standards cannot be met, nor have we found 
any evidence that the standards have been violated. 
(3) The Panel stresses that opportunities for evaluation are 
seriously constrained by the quality of information available on 
the population of manpower program participants. Current files 
of data which identify characteristics of trainee populations 
(MA-101, MA-102 and Manpower Automated Reporting System (MARS)
files) apparently contain serious non-reporting biases. The 
exact nature of these biases is not known with any certainty. 
Therefore, the Panel recommends initiation by the Department of 
Labor of an evaluation of the MARS file, addressing the following 
questions: 

(a) Why is there error in reporting? 

(b) What is the source of this error? 

(i) Is there a systematic failure of certain projects 
to report correctly, or 

(ii) Is the non-reporting random? 

(c) What can be done about 

(i) The non-reporting of data, and 
(ii) The resulting bias, if any? 




(4) The Panel recommends that a study be undertaken to determine
the validity of Social Security data for manpower program evalua-
tion. Such a study would compare the outcomes of a true experi-
mental design to evaluate a manpower program with outcomes as
measured by Social Security data. The scope and target popula-
tions of the study are topics left to be developed by the
U. S. Department of Labor.
I. The Problem Setting

Manpower training programs have been in existence a 
little over a decade, yet, with the possible exception of the 
Manpower Development and Training programs, little is known 
about the educational or economic effects of manpower training 
programs. This is troublesome, especially in light of the fact 
that about $180 million have been spent over the past ten years 
in an attempt to evaluate these programs.¹

There are several reasons for lack of clarity in the 
definition of program effects: First, inadequate research 
methods are often used, even when adequate methods are available. 
For instance, a study may fail to use a proper control or com- 
parison group, or may use no control group at all. Second, 
almost all evaluations are case studies rather than studies 
based on national samples, so that considerable restraint must 
be exercised in generalizing results to issues of national 
policy. Third, many studies use non-random judgment samples, 
rather than probability samples. As a result, we have no idea 
of the representativeness of the study sample compared with the 
population from which the samples were drawn. Fourth, most of 
the studies are retrospective. Considerable time passes between 
the end of a program and its evaluation. Many sample respon-
dents disappear, resulting in serious non-response bias. For the
respondents who are located, recall error further biases the
results of the study. Next, studies often fail to collect appro-
priate socio-demographic information. Some of these variables
influence program results but are not affected by those programs;
when they are omitted, the analysis of program impacts is
contaminated by extraneous factors. Finally, the existing analyses
cover a spectrum of data, methods, projects, time periods and
locales, and it is next to impossible to compare the results even
of studies of the same program.

¹ Jon H. Goldstein, "The Effectiveness of Manpower Programs: A
Review of Research on the Impact on the Poor," Studies in Public
Welfare, Paper No. 3, Subcommittee on Fiscal Policy, Joint
Economic Committee, Congress of the United States, Washington:
U.S. G.P.O., 1972, p. 14.

In short, while reliable evaluation is badly needed, it 
does not exist even after ten years of study and the application 
of large amounts of public resources. 

The need for a less expensive, more reliable evaluation 
strategy is imperative; for important decisions on social pro- 
grams continue to be made in the absence of reliable objective 
information. It is with considerable sense of urgency, then, 
that the United States Department of Labor has been investigating 
the possibility of using economic and socio-demographic data from 
the earnings and tax records of the Social Security Administration 
(SSA). Indeed, the Department and other organizations have done
considerable experimentation with these data. Some studies have 
used the data in actual evaluations, while others have been de-
signed to test the feasibility of employing the data in evalua-
tions.² Methodological studies unanimously conclude that Social
Security data are extremely inexpensive as well as highly re-
liable data sources. However, while there are several positive
advantages to using Social Security data, there are also some
disadvantages.

² For studies that use the data to evaluate selected manpower
programs, see Michael E. Borus, "Time Trends in the Benefits from
Retraining in Connecticut," Industrial Relations Research Asso-
ciation Proceedings, Washington, D. C., December 28-29, 1967; Edward C.
Prescott and Thomas F. Cooley, Evaluating the Impact of MDTA 
Programs Under Varying Labor Market Conditions , Final Report, 
MEL 73-08, U. S. Department of Labor Contract No. 83-42-71-04,
Philadelphia, Pennsylvania: University of Pennsylvania, October 17,
1972; and James L. Stern, "Consequences of Plant Closure," The
Journal of Human Resources, Winter, 1972.

Studies which attempt to assess the feasibility of using 
Social Security data as a tool in manpower program evaluation are 
the following: J. B. Berterman, Review of the Manpower Training
Follow-up Data Analysis System, Final Report (Draft), U. S. Depart-
ment of Labor Contract No. 43-1-003-51, The Planning Research Cor-
poration, McLean, Virginia, March 17, 1973; William D. Commins,
Social Security Data: An Aid to Manpower Program Evaluation,
PRCR-1543, The Planning Research Corporation, McLean, Virginia,
November 1970; David J. Farber, "Using Social Security Records
to Measure Change in Trainee Earning Capacity," U. S. Department
of Labor, Manpower Administration, OMMDS, Unpublished Draft Paper, 
November 25, 1970; David J. Farber, "Changes in the Duration of 
the Post-Training Period in Relative Earning Credits of Trainees: 
Class of 1964— A Graphic Synopsis," U. S. Department of Labor, 
Manpower Administration, OMMDS, Administratively Restricted Un- 
published Paper, August 27, 1971; David J. Farber, "A Reply to 
the Miller Critique of the M.A. (Manpower Administration) Method 
of Evaluating the Gains in Earnings of MDTA Trainees," Unpublished 
Paper, Dated November 1972; Louis S. Jacobson, "The Use of Social 
Security Data in the Evaluation of Manpower Programs," The 
Public Research Institute, Center for Naval Analyses, Arlington, 
Virginia, Unpublished Draft Report, February 14, 1973; Louis S.
Jacobson, "An Assessment of the Longitudinal Models of Income
Determination Used to Estimate the Impact of MDTA Training on
Earnings," The Public Research Institute, Center for Naval
Analyses, Arlington, Virginia, Unpublished Draft Report, May 1,
1973, Revised June 13, 1973; and Louis S. Jacobson, "The Use
of Longitudinal Data to Assess the Impact of Manpower Training
on Earnings," PRI 73-2, The Public Research Institute, Center
for Naval Analyses, Arlington, Virginia, Final Report, 20 July 1973.
This report sets forth both the benefits and shortcomings
of using Social Security data for program evaluation and, where 
appropriate, compares these data with the feasible alternatives. 
The report is divided into three parts. The first part deals with 
the structure of Social Security data for evaluation of manpower 
training programs. The second compares the benefits and short- 
comings of using Social Security data in evaluations with alterna- 
tive data sets. The third section discusses selected methodologi- 
cal issues. And a concluding summary ends the report. 

II. Current Structure of Social Security Data Available 
for Evaluation of Manpower Training Programs 

The procedure for assembling Social Security earnings 
data involves use of social security numbers taken from trainee 
records stored in the Manpower Automated Reporting System (MARS) 
file and matching them with earnings records on file at the Social 
Security Administration (SSA). The Social Security Administration
provides the information shown in Table 1. The matched data are 
then returned to the Manpower Administration where they are 
merged with the information on trainee characteristics shown in 
Table 2. For purposes of comparison, the data from matched 
trainee records are compared with the data from a random sample 
of non-trainees taken from the Continuous Work History Sample 
(CWHS), a 1 percent sample of the basic SSA master file. 
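
The matching step described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the Manpower Administration's actual procedure: the record layouts, field names, and sample values are hypothetical, and carry only a handful of the variables available on the two files.

```python
from dataclasses import dataclass, field


@dataclass
class TraineeRecord:
    """Hypothetical subset of the fields carried on a MARS trainee record."""
    ssn: str
    program_code: str
    start_year: int


@dataclass
class EarningsRecord:
    """Hypothetical subset of the fields carried on an SSA earnings record."""
    ssn: str
    birth_year: int
    sex: str
    race: str
    earnings_by_year: dict = field(default_factory=dict)  # year -> covered earnings


def match_trainees(mars_records, ssa_records):
    """Join trainee records to earnings records on Social Security number.

    Returns (matched, unmatched): matched is a list of
    (TraineeRecord, EarningsRecord) pairs; unmatched lists the trainees
    whose number was not found in the earnings file.
    """
    ssa_by_ssn = {rec.ssn: rec for rec in ssa_records}
    matched, unmatched = [], []
    for trainee in mars_records:
        earnings = ssa_by_ssn.get(trainee.ssn)
        if earnings is None:
            unmatched.append(trainee)
        else:
            matched.append((trainee, earnings))
    return matched, unmatched


# Illustrative (made-up) records.
mars = [
    TraineeRecord("000-00-0001", "MDTA", 1967),
    TraineeRecord("000-00-0002", "MDTA", 1967),
]
ssa = [
    EarningsRecord("000-00-0001", 1940, "M", "W",
                   {1965: 3200, 1966: 2900, 1968: 4100}),
]
matched, unmatched = match_trainees(mars, ssa)
```

Keeping the unmatched trainees in a separate list makes the extent of non-matching visible, which bears on the reporting problems in the trainee files that the Panel raises in its findings.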
TABLE 1

VARIABLES INCLUDED ON THE CWHS DATA FILE AND
ACCESSIBLE TO MARS MATCHING FILES

Birth year
Birth month
Race
Sex
Quarters employed 1937-1950
Quarters employed 1951-1972
Total earnings 1937-1950
Total earnings 1951-1972
Dead or Alive Code
Total self employed quarters 1937-1950
Total agricultural quarters employed 1937-1950
Total self employed quarters 1951-1972
Total agricultural quarters employed 1951-1972
Earnings 1951 to present (1972) by year
Quarters employed 1951 to present (1972) by year
TABLE 2

SELECTED VARIABLES INCLUDED ON THE MARS FILE

Class
Social Security Administration Number
Last Name
Initials
Birth Date
Program Code
Contract Identification
State Code
Fiscal Year of approval
Flag for Estimated Termination date
Termination Status
Start date
Termination date (actual or estimated)
Sex
Race
Ethnic Origin
Language spoken
Veteran Vietnam era
Marital Status
Number of Dependents
Highest school grade completed
Public Assistance
Dictionary of Occupational Titles, Primary Occupation
Length of stay in program
Test score (Job Corps only)
Dictionary of Occupational Titles of training (three higher order digits)
As can be seen, a considerable amount of information can
be generated on program participants. While these variables do
not exhaust the list of eligible variables used in program evalu-
ations, they do include information on age, sex, race, and
education, which is most easily rationalized in all theoretical
models used to estimate a program's effectiveness. Furthermore,
in contrast to most other data sources, participants' earnings 
histories can be followed accurately for extended periods of 
time. However, these data, while extremely useful, are not 
perfect. 

It must be noted that much of the information from trainee 
records shown in Table 2 does not exist for any comparison group 
one might wish to generate from the Continuous Work History 
Sample. While the Manpower Automated Reporting System file pro-
vides information on such important variables as age, sex, race,
marital status, education, veteran status and primary occupation 
of trainees, the CWHS file provides data only on age, sex, and 
race of the comparison group. 

It is also true that the reporting of variables by type 
and number is not uniform across manpower programs. Nevertheless, 
SSA data include a highly accurate longitudinal earnings history. 
Indeed, this is their unique and most interesting feature. It 
can be argued that such factors as education, family background, 
motivation, and achievement fundamentally determine a person's 
expected lifetime earnings. Therefore, prior earnings histories 
must be a reflection of these very same variables. Use of an 
extensive earnings history prior to program involvement controls 
for labor market influences of socio-demographic variables, and 
the absence of certain specific variables, such as education, is
not damaging to the effective application of the SSA data. Prior
patterns of earnings serve as very powerful controls, even though 
they may not compensate for all missing variables. On the whole, 
however, the critical variables for analysis do exist: age, sex, 
race, prior- and post-program earnings history, and information
relating to program structure and experience. 

It is possible, though relatively expensive, to use the 
CWHS of employers to add to the list of potential variables by 
linking it up with the CWHS of individuals. The Longitudinal 
Employee-Employer Data (LEED) sample achieves this with a 1 
percent sample of SSA data from employer and employee records.³
The LEED data can be used to assess industrial and geographical 
mobility of workers. Information can be obtained on workers' 
industry attachments at the four-digit Standard Industrial Classi- 
fication level, as well as on their location. Such data would be 
desirable to determine if regional or industry-specific influences 
affect the pattern of benefits from manpower training. 

However, the location variable is faulty in that it may 
report either an establishment location or the location of the 
firm's home office with no indication of which is involved. Thus, 
the location reported may not coincide with the location of the 
worker whose earnings are being reported. In addition, in order 
to trace most trainees' industrial employment patterns, it is 
necessary to scan every employer in the SSA file of employers. 



³ Longitudinal Employer-Employee Data (LEED), Social Security Ad-
ministration, Office of Research and Statistics, Division of
Statistics, Statistical Operations Branch, April 1970.
One crude estimate of the cost of generating mobility data is
about $2,000 per observation, if performed on a quarterly basis.
While location and industrial mobility data are useful, they are
not critical to evaluation, since prior- and post-training earnings
patterns to a large extent reflect the effect of industry and
region. Therefore, the potential absence of industrial and geo-
graphic variables from analysis is not sufficient to reject the
SSA data.

III. Comparisons of Social Security Data with 
Alternative Data Sets 

Any recommendation with respect to the use of the SSA 
data depends on what one gains or loses in comparison with other 
data bases. Before discussing the salient advantages and dis- 
advantages of the SSA data, a brief summary of the main positive 
and negative aspects of these data is in order. 

The advantages gained from using these data are: 

(1) The cost per unit of observation is extremely small. 

(2) The data are of very high accuracy. 

(3) The data are longitudinal. 

(4) There is no non-response bias due to missing observa-
tions or variables.

(5) The data embody a comparison group as good as any that
have been used in existing evaluations. 

(6) The sample sizes are very large. 

The disadvantages of using these data are: 

(1) Earnings rather than hours of work and wage rates are 
reported. 

(2) No detailed information is available on labor force 
participation. 

(3) Reported earnings are truncated for those earning 
above the Social Security maximum.

(4) Social Security coverage varies as a function of age. 

(5) A limited number of socio-demographic variables are
available. 

(6) There is some time lag in the full reporting of the
data.

(7) Problems exist with maintaining confidentiality of the 
data. 

A. Advantages of SSA Data 

Cost. With respect to cost, the SSA data are overwhelmingly
superior to other sources. One need only contrast the cost of a
few cents (less than $0.10) per observation with the cost of over
$600 per observation for the data now being collected for the
Office of Economic Opportunity-U. S. Department of Labor
Longitudinal Evaluation Study of Four Manpower Training Programs.⁴
A far less costly study of the In-School and Summer Neighborhood
Youth Corps (NYC) still cost approximately $35 per observation.⁵
The difference in cost between these two studies results from
the use of only one personal field interview for the NYC evaluation,
in contrast to four for the Longitudinal Evaluation. Also, lower
expense was incurred with the NYC study because efforts to locate
nonrespondents were less vigorous. Both these and similar studies
characteristically collect many variables per observation and
much elaborate detail on each variable. However, it has generally
been the experience of the Panel members that only a small pro-
portion of these variables is ever used. Therefore, the value
of the enriched data of the field survey is often more apparent
than real. Indeed, the collection of so many and diverse variables
often reflects poor planning and the absence of an appropriate
evaluative model. But, whatever the reason, it is clear that the
marginal value of many of these data is very low. Such studies
repeatedly fall back on a few variables whose theoretical
effects in a model of income determination are predictable: age,
sex, race, education, and marital status, to name the most
obvious. And, of course, the SSA data contain information on age,
sex, and race.

⁴ Longitudinal Evaluation Study of Four Manpower Training Programs,
Prepared under Contract No. B99-4783, U. S. Office of Economic
Opportunity, Division of Evaluation, Washington, D. C., 1969
and other dates.

⁵ Gerald G. Somers and Ernst W. Stromsdorfer, Cost-Effectiveness
Study of the In-School and Summer Neighborhood Youth Corps, Madi-
son, Wisconsin: Industrial Relations Research Institute, Center
for Studies in Vocational and Technical Education, The University
of Wisconsin, 1970.
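
The scale of the cost difference is easiest to see when the per-observation figures quoted above are applied to a common sample size. The sketch below does only that arithmetic. The sample size of 10,000 is a hypothetical choice, and the SSA and Longitudinal Evaluation figures are the bounds stated in the text ("less than $.10" and "over $600"), so the resulting multiples should be read as rough lower bounds.

```python
# Per-observation costs quoted in the text, held in cents to keep the
# arithmetic exact. The SSA figure is an upper bound and the Longitudinal
# Evaluation figure a lower bound, as stated in the report.
COST_CENTS = {
    "ssa": 10,                     # "less than $.10"
    "nyc_survey": 3500,            # "approximately $35"
    "longitudinal_survey": 60000,  # "over $600"
}


def total_cost_dollars(source, n_obs):
    """Total data-collection cost in dollars for n_obs observations."""
    return COST_CENTS[source] * n_obs / 100


def cost_multiple(source, baseline="ssa"):
    """How many times more one source costs per observation than the baseline."""
    return COST_CENTS[source] / COST_CENTS[baseline]


SAMPLE_SIZE = 10_000  # hypothetical evaluation sample

for source in COST_CENTS:
    print(f"{source:>20}: ${total_cost_dollars(source, SAMPLE_SIZE):>12,.0f}"
          f"  ({cost_multiple(source):,.0f}x the SSA cost)")
```

On these figures, a sample that costs on the order of a thousand dollars to assemble from SSA records costs several hundred thousand dollars with one field interview per respondent, and several million with four.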

Errors in the Data. One distinct advantage of the SSA
data lies in the fact that they contain neither interviewer bias
nor interviewer error. The earnings reported are accurate, ex-
cept insofar as employers may find it in their interest to
under-report to avoid the tax or the costs of the paperwork.
Respondents clearly cannot interject error into the data through 
failure to recall accurately, nor are they in a position to dis- 
semble the true nature of their earnings. Finally, there is no 
interviewer to inject non-random error into the reported data. 
In contrast, sample survey data rely on retrospective recall and 
are subject to interviewer-interviewee interaction, which creates 
serious problems. No statistical technique can overcome these 
errors.

Longitudinal Earnings. The strongest point in favor of
the use of the SSA data lies in the longitudinal earnings history 
thus made available for analysis. The strengths of this informa- 
tion have been discussed above. However, we should again point 
out that SSA data allow one to accurately trace a worker's 
earnings history for the entire period in which he is working in 
covered employment. No survey data can do this. In addition, 
over 90 percent of the workers in the United States are now 
covered by Social Security.

Non-Response Bias. The SSA data are notable for their
lack of non-response bias. This judgment must be tempered by 
an awareness of lack of coverage for certain occupations, as al- 
ready discussed. By contrast, sample surveys that rely on mail 
questionnaires are fortunate to have a response rate as high as 
30 percent. Personal field interviews can often pick up over 
80 percent of an original sample, but the marginal cost of the
hard-to-locate observations often exceeds $100.

The Problem of the Control Group. With respect to the
selection of a control group, the designs of past evaluations have
been in no way superior to the comparison groups available to analyses
based on the SSA data. The Panel is aware of only one evaluation
study, a case study of black girls in an NYC program in Cincinnati, 
Ohio, which used a true experimental design with random assign- 
ment of a study sample to an experimental and control group. This 
study also suffered from non-response bias through attrition of 
both controls and experimentals.⁶

Sample survey studies usually employ program dropouts or
no-shows as comparison groups. Some investigators attempt to
generate a comparison sample that is legally eligible to enroll 
in the program in question. But none of these efforts overcomes
the problem of self-selection bias; they merely redefine it. 
Thus, Herman Miller has criticized the Farber studies for using 
SSA-CWHS observations instead of program dropouts as a comparison 
group. Neither has an overwhelming theoretical appeal over 
the other. To be sure, dropouts self-select themselves into the 
program. In this regard, dropouts are similar to the trainees 
who complete a program. However, dropouts also self-select
themselves out of the program — some because they perceive better 
opportunities elsewhere; others because they represent program 
failures. In the absence of variables that define the reasons 
for dropping out, the injection of bias due to self-selection 
out of the program is some unknowable mixture of the two effects.
The SSA-CWHS observations, on the other hand, can be matched as 
closely with program completers as can dropouts. Additionally, 
use of SSA-CWHS data can help settle the issue of choice between 
dropout comparison groups and CWHS comparison groups by analyzing 
pre-training earnings patterns of completers, dropouts, or CWHS 
observations for any given age, sex, or race group. 
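
One way to picture the matching of CWHS observations with program completers is a nearest-neighbor pairing on prior earnings within age, sex, and race cells. The sketch below is an illustration only: the greedy pairing rule, the field names, and the sample records are assumptions made for the example, not a procedure taken from the studies cited.

```python
def match_on_prior_earnings(trainees, comparison_pool):
    """Pair each trainee with the unused comparison observation in the same
    age/sex/race cell whose prior earnings are closest (greedy, without
    replacement). Trainees with no candidate in their cell are left unpaired.
    """
    available = list(comparison_pool)
    pairs = []
    for t in trainees:
        cell = (t["age"], t["sex"], t["race"])
        candidates = [c for c in available
                      if (c["age"], c["sex"], c["race"]) == cell]
        if not candidates:
            continue
        best = min(candidates,
                   key=lambda c: abs(c["prior_earnings"] - t["prior_earnings"]))
        available.remove(best)  # each comparison observation is used once
        pairs.append((t["id"], best["id"]))
    return pairs


# Illustrative (made-up) records; prior_earnings is mean annual earnings
# over some pre-program period.
trainees = [
    {"id": "T1", "age": 25, "sex": "M", "race": "W", "prior_earnings": 3000},
    {"id": "T2", "age": 25, "sex": "M", "race": "W", "prior_earnings": 3100},
    {"id": "T3", "age": 30, "sex": "F", "race": "B", "prior_earnings": 2000},
    {"id": "T4", "age": 40, "sex": "M", "race": "W", "prior_earnings": 5000},
]
cwhs_pool = [
    {"id": "C1", "age": 25, "sex": "M", "race": "W", "prior_earnings": 3050},
    {"id": "C2", "age": 25, "sex": "M", "race": "W", "prior_earnings": 5000},
    {"id": "C3", "age": 25, "sex": "M", "race": "W", "prior_earnings": 2900},
    {"id": "C4", "age": 30, "sex": "F", "race": "B", "prior_earnings": 2400},
]
pairs = match_on_prior_earnings(trainees, cwhs_pool)
```

The same routine can be run with dropouts as the comparison pool, so that the closeness of the two kinds of match can be compared directly on pre-training earnings.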



⁶ Gerald D. Robin, An Assessment of the In-Public School Neighbor-
hood Youth Corps Projects in Cincinnati and Detroit, with Special
Reference to Summer-Only and Year-Round Enrollees, Philadelphia,
PA: National Analysts, Inc., February, 1969.
Many evaluations use a before-after framework of comparison
to override problems inherent in self-selection. However, before- 
and-after techniques inject a different type of error into program 
evaluations: Often individuals enter manpower programs because 
their earnings are temporarily low. MDTA records show that almost 
all trainees are unemployed when they join the program, making it 
difficult, if not impossible, to project what they would have 
earned in the absence of training. However, their expected life-
time employment and earnings are higher than actual earnings at
the time they enter the program. Over time their earnings will 
regress toward the mean. Thus, use of a before-after comparison 
without a control group must certainly overestimate program 
effects.⁸
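
The regression-toward-the-mean argument can be illustrated with a small simulation. The sketch below is a stylized model under assumed parameters, not an estimate from any data: each person's earnings are a permanent component plus a transitory one, people enter the pool of eligibles when their earnings happen to be low, and eligibles are then assigned at random to training or to a control group. Every numerical value, including the true program effect of $200, is a hypothetical choice.

```python
import random
from statistics import mean

TRUE_EFFECT = 200.0  # assumed true annual earnings gain from training


def simulate(n_people=20_000, seed=0):
    """Compare a before-after estimate with a control-group estimate.

    Returns (before_after, with_control), both in dollars per year.
    """
    rng = random.Random(seed)
    treated_pre, treated_post, control_post = [], [], []
    for _ in range(n_people):
        permanent = rng.gauss(6000, 1000)         # long-run earnings level
        pre = permanent + rng.gauss(0, 1500)      # earnings before the program
        if pre >= 4000:
            continue                              # only low earners become eligible
        treated = rng.random() < 0.5              # random assignment among eligibles
        post = permanent + rng.gauss(0, 1500)     # transitory component is redrawn
        if treated:
            treated_pre.append(pre)
            treated_post.append(post + TRUE_EFFECT)
        else:
            control_post.append(post)
    before_after = mean(treated_post) - mean(treated_pre)
    with_control = mean(treated_post) - mean(control_post)
    return before_after, with_control


before_after, with_control = simulate()
print(f"before-after estimate:  {before_after:8.0f}")
print(f"control-group estimate: {with_control:8.0f}")
print(f"assumed true effect:    {TRUE_EFFECT:8.0f}")
```

Because eligibles are selected on temporarily low earnings, their earnings rebound whether or not they are trained, and the before-after estimate attributes the whole rebound to the program. The comparison with randomly assigned controls nets the rebound out.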

The existence of temporarily low earnings due to poor
labor market prospects prior to entering a program is a major
problem that must be overcome statistically to arrive at an
accurate estimate of program effect.⁹ Trainees with temporarily
low earnings prior to training may come from a different popu-
lation of persons than the CWHS comparison group. The Department
of Labor evaluation staff in the Assistant Secretary's Office of
Policy, Evaluation and Research (ASPER) has demonstrated the
critical impact of this issue. For white male MDTA institutional
completers an explicit accounting for "trouble in the labor
market" converted a negative program effect of approximately
$200 per year to a positive effect of over $400 per year, both
over the same five year period.

⁷ Jacobson, op. cit., February 14, 1973, p. 2.

⁸ Hardin and Borus experimented with their Michigan retraining data
and found the gains from retraining were $1,524 using a before-
after method; when using a control group, the gains were only
$216 in the year following training, a difference by a factor of
seven. See Einar Hardin and Michael E. Borus, Economic Benefits
and Costs of Retraining Courses in Michigan, East Lansing, Michi-
gan: Michigan State University, December, 1969.

⁹ Jacobson, op. cit., February 14, 1973, p. 9.

In summary, past sample survey evaluations have no clear 
advantage over the use of SSA-CWHS data with respect to choice
of comparison groups. The problem of self-selection bias is 
present in both data sets and at this point no one can say where 
it is more severe. This difficulty will persist until the 
federal government undertakes carefully planned experiments in 
manpower training evaluation. The before-after design is no 
solution to the self-selection problem due to the problem of 
transitory low earnings and employment prior to training. 

Thus, the SSA-CWHS data clearly warrant use and experimen- 
tation. Two methodological alternatives have been suggested to 
deal with the comparison group problem. One is to use people 
who enter a program in, say, 1969, as a control for those persons 
who enter the same program in 1967. Given appropriate infla- 
tionary adjustments, this might be legitimate if the objectives 
and target population of a given program do not change over the 
years in question. The second suggestion involves the use of 
successive age cohorts of trainees as comparisons for immediately 
prior age cohorts. Thus, for instance, when analyzing the effect
of a program on persons 23 years of age one year after they leave 
a program, researchers can utilize the labor market experience of 
24 year olds who are currently in the program as a possible 
comparison group. Though neither alternative overcomes the
problem of transitorily low pre-training earnings, these and other
techniques deserve consideration.

Finally, it must be noted that cross-program comparisons 
of particular socio-demographic groups are just as difficult as 
comparisons between trainees of a specific program and a selected 
comparison group. This is so because self-selection also exists 
as a function of program type. Such self-selection is not well 
understood, but this, of course, is a general problem.

B. Disadvantages of SSA Data

Earnings. Our discussion of the weaknesses of SSA earnings
data encompasses the first three points listed on page 9 above. 

One difficulty with the SSA earnings measure is immediate. 
An individual's earnings may exceed the maximum taxable income. The 
seriousness of this problem depends on the proportions of program 
participants and comparison groups from SSA data which exceed the 
maximum. The proportions undoubtedly vary among programs and 
socio-demographic groups. Table 3 presents an example for males 
and females reporting maximum earnings in some MDTA institutional 
and On-the-Job Training (OJT) manpower projects.



TABLE 3

PERCENT OF SAMPLE WITH MAXIMUM
SOCIAL SECURITY EARNINGS CREDITS

                     1966     1969     1970

Earnings Maximum    $6600    $7800    $7800

Males                   2        6        8

Females                 0      0.5      0.5

Source: Prescott and Cooley, op. cit., Table 7, p. 13.
The loss of 2 percent of the male sample shown in Table 3
may not be critical to accurate evaluation. However, preliminary
analysis by economists at ASPER on the impact of MDTA institutional
and OJT training on white males resulted in a 33 percent loss of
trainee observations when trainees whose earnings exceeded the
maximum were excluded from the analysis. Clearly, unknown biases
may creep into the analysis when the loss is this large. The
reasons for the discrepancy between the ASPER and Prescott and Cooley
results remain to be determined. However, part of the difference
must be due to the fact that the ASPER study was limited to white
males only, while the Prescott and Cooley study covered all males.

Some investigators have attempted to bypass the problem 
by extrapolating earnings of those who reach the maximum. But 
all these methods are essentially arbitrary. If one cannot be 
reconciled to the use of arbitrary methods, persons with greater 
than maximum taxable earnings can be eliminated from the analysis. 
This creates a new problem by confining the analysis to the net 
remaining group: The data are not representative of the program 
population as a whole, though they can provide information for 
those with relatively low earnings, a group with which policymakers 
are often most concerned. 
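The effect of the taxable maximum on measured earnings can be sketched in a few lines. The ten earnings figures below are hypothetical and serve only to show the mechanics; the $7,800 ceiling is the 1969 and 1970 maximum reported in Table 3. The function names are illustrative and do not come from any SSA system.

```python
from statistics import mean


def top_code(earnings, taxable_max):
    """Return earnings as a capped record would show them."""
    return [min(e, taxable_max) for e in earnings]


def share_at_max(recorded, taxable_max):
    """Fraction of records sitting at the taxable maximum."""
    return sum(1 for e in recorded if e >= taxable_max) / len(recorded)


# Hypothetical annual earnings for ten trainees (illustrative only).
true_earnings = [0, 1500, 2800, 3600, 4200, 5100, 5900, 6400, 8200, 9500]
TAXABLE_MAX = 7800  # the 1969-1970 maximum shown in Table 3

recorded = top_code(true_earnings, TAXABLE_MAX)
# Second option discussed in the text: drop everyone at the maximum.
kept = [e for e in recorded if e < TAXABLE_MAX]

print("share at maximum:        ", share_at_max(recorded, TAXABLE_MAX))
print("mean of true earnings:   ", mean(true_earnings))
print("mean of capped records:  ", mean(recorded))
print("mean after exclusion:    ", mean(kept))
```

In this toy sample both treatments understate mean earnings, and excluding the capped records understates them more, which is consistent with the point that the remaining group describes only those with relatively low earnings.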

Another major problem with the earnings measure is that 
some workers report zero earnings in a given quarter. There are 



The maximum was $4,800 until 1964. This problem is less important
today due to the high limit of $10,800. The limit is
scheduled to rise still further.

See Stern, op. cit., p. 10, and Borus, op. cit., p. 37.
four possible reasons for observing zero earnings:

(1) The worker is not in the civilian labor force.

(2) The worker is in the civilian labor force, but
unemployed.

(3) The worker failed to earn the minimum of $50 from any 
given employer per quarter. 

(4) The worker was employed in an uncovered occupation such 
as farm work, most federal government occupations, or 
some state and local government occupations. 

As indicated above, about 90 percent of the workers in the
United States are now covered by Social Security. Very few MDTA
trainees are agricultural workers or federal employees, and coverage
is even higher for them. Therefore, most persons who report no
earnings in a quarter are either unemployed during the quarter,
out of the labor force, or some combination of the two.

The lack of precise knowledge as to the reason for zero
earnings can create bias in the estimate of the impact of a
manpower program on earnings. It makes a difference to the analysis
whether a person has no market earnings because of voluntary withdrawal 
from the labor force or because of unemployment. The fact is that 
a large proportion of persons report zero earnings, though data 
from one study shown in Table 4 indicate the problem is much 
more important for females than for males. Yet the problem is 
far from trivial for males. When the ASPER group excluded zero 
earners from a sample of white males who had been in MDTA 



Prescott and Cooley, op. cit., p. 13.

Louis Jacobson at the Center for Naval Analysis is currently
investigating this problem. 
TABLE 4

PERCENT OF SAMPLES WITH NO REPORTED SOCIAL SECURITY
EARNINGS CREDITS BY YEAR

            1966    1969    1970

Males        17%      9%     16%

Females      39%     20%     28%

Source: Prescott and Cooley, op. cit., Table 8, p. 14.

institutional or OJT programs, the total number of trainee
observations fell by more than 50 percent. Clearly, it is most
important to gain information on the reasons why trainees had zero
earnings in any given quarter.

It will be useful to obtain information on those persons 
in each manpower program who exceed the maximum earnings in any 
given year, those who have zero earnings in any given quarter, 
and the number of zero earnings quarters per year all classified 
by relevant socio-demographic characteristics. It should then be 
possible to determine which programs are most seriously affected 
by this limitation of the data. The U. S. Department of Labor 
should support a study to analyze the characteristics of 
persons who exceed the maximum, as well as of those who report zero
earnings, in order to determine the structural and behavioral
reasons for these types of behavior.
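A tabulation of the kind recommended here might look like the following sketch. The record layout (a group label plus four quarterly earnings figures), the group labels, and the sample values are hypothetical; the sketch also assumes that a person counts as "at the maximum" when the four quarters sum to the annual ceiling or more.

```python
from collections import defaultdict


def tabulate(records, taxable_max):
    """Summarize maximum earners and zero-earnings quarters by group.

    records: iterable of (group, [q1, q2, q3, q4]) covered earnings.
    """
    counts = defaultdict(lambda: {"persons": 0, "at_max": 0,
                                  "any_zero": 0, "zero_quarters": 0})
    for group, quarters in records:
        c = counts[group]
        zeros = sum(1 for q in quarters if q == 0)
        c["persons"] += 1
        c["at_max"] += 1 if sum(quarters) >= taxable_max else 0
        c["any_zero"] += 1 if zeros > 0 else 0
        c["zero_quarters"] += zeros
    return {
        group: {
            "persons": c["persons"],
            "share_at_max": c["at_max"] / c["persons"],
            "share_any_zero": c["any_zero"] / c["persons"],
            "mean_zero_quarters": c["zero_quarters"] / c["persons"],
        }
        for group, c in counts.items()
    }


# Hypothetical records for one program year (illustrative only).
sample = [
    ("white male", [2000, 2000, 2000, 1800]),
    ("white male", [0, 900, 1200, 1100]),
    ("white male", [1500, 1500, 1600, 1700]),
    ("black female", [0, 0, 400, 600]),
    ("black female", [0, 0, 0, 0]),
]

summary = tabulate(sample, taxable_max=7800)
for group, row in summary.items():
    print(group, row)
```

Run separately for each program, a table of this form would show where the two data limitations remove the largest share of observations.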

A final problem with the earnings data is due to the fact 
that Social Security sources do not separately report wage rates 
and hours worked. Observed earnings are not unambiguous measures 
of an increase in economic welfare, since earnings reflect varia- 
tions in both wage rates and hours worked. Changes in hours 
worked imply corresponding changes in the benefit measures solely 
due to changes in non-market time available to a trainee. On 
the other hand, wage rates index real productivity. If training 



Consider the case in which the reason for zero earnings differs
between the post-training and the pre-training period. An
example would be the movement of a person from uncovered
employment before training to covered employment after training. Zero
earnings are reported before training, with a resulting upward
bias in the estimation of the before-after training effects. A
similar problem exists if a person moves from non-labor force
participation prior to training to covered employment after
training. Some or all of the earnings increase is due to the
simple act of entering the labor force.

Two sets of data can be used to check the nature
of these particular biases in the SSA data. First is the
data set of the Longitudinal Evaluation Study of Four Manpower
Training Programs, cited above. The other is the data set of
the National Longitudinal Surveys developed by the U. S.
Department of Commerce, Bureau of the Census and the Center for Human
Resource Research at Ohio State University in Columbus, Ohio.
Each of these data sets contains the Social Security number of the
sample respondents. The National Longitudinal Survey data are
already being utilized to evaluate training programs. See Gerald
G. Somers, "An Evaluation of the Effects of Manpower Programs in
the United States Based on the National Longitudinal Surveys,"
Madison, Wisconsin: Department of Economics, University of
Wisconsin, in progress.
increases productivity, the worker's wage rate should rise. An
increase in the wage rate results in a corresponding potential
increase in income, with no loss of welfare due to reduced leisure
or non-market work. Of course, the problem is more serious for
women, since the uses of a woman's time in the home have a high
value compared with other uses. In short, due to the
impossibility of decomposing earnings into wage rates and hours worked,
as well as the impossibility of distinguishing the causes of low
earnings (unemployment, involuntary short work week, or voluntary
withdrawal from the labor force), the SSA data are limited to
measuring a single specific outcome for programs that essentially
have multiple outputs. This outcome can also be subject to
considerable bias in measurement.

Coverage as a Function of Age. Related to the problem
of zero-earnings experience is the fact that Social
Security coverage is partially a function of age, as young workers 
first enter the labor force. Thus, Job Corps and Out-of-School 
Neighborhood Youth Corps (NYC) serve mainly young persons under 
21. Yet, the labor force participation rate of males aged 16-19 
is only 58.1 percent, while it is 83.9 percent for males aged 
20-24 and increases to a high of 96.4 percent for males aged 



These ideas appear in a memorandum from Dr. Orley Ashenfelter, 
Director, Office of Evaluation, Department of Labor to Mr. Michael 
Moskow, Assistant Secretary for Policy, Evaluation, and Research, 
Department of Labor, dated May 31, 1972. 




35-44. Thus, if we desire to evaluate the Job Corps program
with SSA data, it is likely that, without appropriate adjustment, 
we would over-estimate program benefits. There would be signi- 
ficant numbers of pre-program quarters of zero earnings due to 
non-labor force participation, while simple entrance to the labor 
force rather than training per se would increase the likelihood 
of positive measured earnings. Adjusting the data by dropping 
observations on persons with zero-earnings quarters would sub- 
stantially preclude an analysis of the impact of the Job Corps 
or NYC programs. Thus, the data are not well suited to evaluate 
the training benefits to teenagers or young workers. In contrast, 
for males aged 35-44 in MDTA or NAB-JOBS, the problem is minimal. 
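The size of the age effect follows directly from the participation rates quoted above. The short calculation below uses only those three rates; reading the non-participation share as a rough indicator of how common zero-earnings quarters from non-participation might be is an assumption, since a participation rate is a point-in-time measure and not a quarterly one.

```python
# Labor force participation rates for males, as quoted in the text (percent).
participation = {"16-19": 58.1, "20-24": 83.9, "35-44": 96.4}


def share_not_in_labor_force(rate_percent):
    """Percent of the age group outside the labor force."""
    return round(100.0 - rate_percent, 1)


non_participation = {
    age: share_not_in_labor_force(rate) for age, rate in participation.items()
}

for age, share in non_participation.items():
    print(f"males aged {age}: {share} percent not in the labor force")

# How much more common non-participation is among teenagers
# than among prime-age males.
ratio = non_participation["16-19"] / non_participation["35-44"]
print(f"ratio, 16-19 to 35-44: {ratio:.1f}")
```

On these figures, non-participation is more than ten times as common among males aged 16-19 as among males aged 35-44, which is why pre-program zero earnings matter far more for Job Corps and NYC than for MDTA or NAB-JOBS.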

Limitations in Available Socio-Demographic Variables. For
the foreseeable future, use of the SSA data will be constrained by the
absence of such important socio-demographic variables as education
or socio-demographic status. The available variables have been
described above. Efforts are now under way within the Social
Security Administration to link Social Security data with the
Current Population Survey (CPS). One other major linkage project
is under way to link Internal Revenue Service (IRS) data, March
1970 CPS data, and 1970 Decennial Census data to the summary
Social Security earnings files. Linkage of the SSA summary
earnings file for 1951-63 with the March 1964 CPS has already been


Changes in the Employment Situation in 1972, Special Labor Force
Report 152, Bureau of Labor Statistics, U.S. Department of Labor,
1973, Table A-28, p. A-27.
completed, covering over 7,000 families and 20,000 individuals.
An additional linkage of 1973 Social Security Administration data
is in the planning stages. If these several linked data sets can
be tied in with the SSA data, most of the major analytical variables
one would need in the evaluation of manpower programs will be
accessible. However, these several linkages will probably be of
little assistance in manpower evaluations because of the extremely
low probability that sufficient numbers of persons receiving
manpower training will appear in the separate samples.

It is apparent from the foregoing analysis that the SSA data 
are more useful for evaluating programs that serve mainly 
prime-age males. The data are not informative on the nature
of labor-force behavior during quarters of zero earnings,
and, therefore, are least useful for persons entering the WIN
program, many of whom have been on welfare for extended periods.
Nor are they reliable for those who shift back and forth between
labor-force and non-labor-force status in response to employment
opportunities. Likewise, these data will be similarly ill suited for
evaluation when the children of welfare families who are forced
to register in the WIN program are involved, since these children
may have little or no labor-force attachment prior to entering
the program. As we have mentioned above, the same is true of
persons who enter the Job Corps or the Out-of-School NYC program.
However, with an average enrollment age of 30 or more and a con- 
centration on males, both the MDTA and the JOBS program are amen- 
able to analysis with SSA data. 

Relative Time Lag. Time lags exist in the collection of
data from both Social Security files and from field surveys.
However, within 12 months, 95 percent of all the Summary Earnings
Records (SER) are accessible for analysis. There is a somewhat
greater time lag for self-employment data, but this is not a major
problem so far as manpower evaluation is concerned. There is a
somewhat greater time lag in generating the CWHS file than in generating
the SER data. For example, the complete Employer-Employee file
(LEED) for 1971 is now available, so the lag here is about two
years.

In an evaluation based on a sample survey, it can easily 
take two years from the inception of a study to the time when 
actual analysis of the data begins. The exact time lag depends
on the care with which the sample design is developed, on how
extensive the effort is to locate non-respondents, and on the
problems of data reduction. Evaluation studies often drag
out for several years.

Problems of Confidentiality. The Panel recognizes that
the issue of confidentiality of the data contained in Social 
Security records is a difficult one. It seems singularly inappro- 
priate for the federal government to release information to pri- 
vate citizens that has been entrusted to it in good faith as 
confidential. Yet, the power to abuse confidential data resides 
within the government also. Therefore, since the Panel recommends 
that these data be more widely used, the Panel also calls for 
strict controls and sanctions to be applied in order to prevent 



Based on information supplied by Mr. Warren Buckler of the SSA. 
their illegal use by either private individuals or government
agencies, whether federal, state, or local. 

However, the Panel does recommend that the data be made 
available to private individuals for two reasons. First, it 
would be unwise and inappropriate for government agencies alone 
to be in a position to conduct evaluations of their own programs. 
Second, the Panel is not prepared to judge whether the greatest 
danger of violation of confidentiality exists when the data are used 
by private individuals or when they are used by the government. Thus,
since agencies other than the Social Security Administration have access 
to individual records that can be identified, it is essential to 
devise a system whereby individual records can be released to pri- 
vate researchers and their confidentiality maintained. This is not 
an insuperable problem. One way to overcome the problem,
though it has its drawbacks, is to allow researchers access to 
variance-covariance matrices or zero-order correlation matrices. 
Other possibilities exist. 
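To see why releasing moment matrices can be enough for some analyses, consider a regression of earnings on a training indicator. The sketch below shows that the least-squares coefficients computed from the means and the variance-covariance matrix alone match those computed from the individual records. The six records are hypothetical, and the function names are illustrative only.

```python
def moments(xs, ys):
    """What an agency could release: means, variances, and the covariance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return {"mean_x": mean_x, "mean_y": mean_y,
            "var_x": var_x, "var_y": var_y, "cov_xy": cov_xy}


def ols_from_moments(m):
    """Least-squares intercept and slope using released moments only."""
    slope = m["cov_xy"] / m["var_x"]
    intercept = m["mean_y"] - slope * m["mean_x"]
    return intercept, slope


def ols_from_microdata(xs, ys):
    """Textbook least squares on the individual records, for comparison."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical records: 1 = trainee, 0 = comparison; annual earnings in dollars.
trained = [1, 1, 1, 0, 0, 0]
earnings = [4200, 3900, 4500, 3800, 4000, 3600]

released = moments(trained, earnings)
print(ols_from_moments(released))
print(ols_from_microdata(trained, earnings))
```

The drawback noted in the text remains: an analyst restricted to these matrices cannot re-cut the sample, inspect outliers, or fit any model that is not a function of the released moments.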

IV. Methodological Techniques to Improve the 
Analytical Qualities of the Data 

In attempting to evaluate the usefulness of the SSA files,
studies by David J. Farber, unpublished studies by ASPER economists,
and the work of Louis Jacobson were reviewed, as well as the
critiques of these efforts by Herman Miller and others.



David J. Farber, in particular, deserves recognition for his
continuing efforts to improve the analytical usefulness of the SSA
data. Indeed, it is largely due to his pioneering efforts that 
the issue was brought into public debate. 
The Panel holds no brief for any particular
statistical method. Farber's various papers employ
cross-tabulations; Prescott and Cooley, Borus, Stern, Jacobson, and the ASPER
group use regression analysis. All are variations of the same 
basic technique, and none has a prior methodological claim. 

Farber's cross-classification results are displayed in an 
800 cell matrix. Patterns of average effect are not impossible 
to detect with this much detail, but the effort can become mind- 
boggling if only one or two more dichotomous variables are added. 
Multivariate analysis can simplify the data analysis, but at the 
cost of imposing more restrictions on the analytical model. 

Farber's model for each of his 800 cells (age (10), sex (2),
color (2), earnings rate (5), and earnings pattern (4)) is
$(Y_{T,prior} - Y_{T,post}) - (Y_{N,prior} - Y_{N,post})$, where Y equals average
earnings, T equals trainee, N equals control, prior equals earnings
prior to training, and post equals earnings after training. This
particular model incorporates the important factor of differences
between prior- and post-training earnings patterns of both the
trainee and control group, though it adjusts for only 20
earnings-pattern distinctions. It is a certainty that there are many
more earnings-rate/earnings-pattern distinctions. However, a
regression or other multivariate framework implies strong
restrictions on the structural differences in earnings patterns between
trainees and comparison groups.
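The cell structure and the within-cell contrast can be written out as a short sketch. The dimension sizes come from the text; the four average-earnings figures are hypothetical. The contrast is computed in the order printed above (prior minus post), so a second helper flips the sign to give a gain that is positive when trainees improve more than controls. Both function names are illustrative.

```python
from itertools import product

# Cell dimensions as given in the text: 10 x 2 x 2 x 5 x 4 = 800 cells.
DIMENSIONS = {"age": 10, "sex": 2, "color": 2,
              "earnings_rate": 5, "earnings_pattern": 4}


def all_cells(dimensions):
    """Enumerate every combination of category indices."""
    return list(product(*(range(k) for k in dimensions.values())))


def cell_contrast(y_t_prior, y_t_post, y_n_prior, y_n_post):
    """The expression as printed: (trainee prior - post) - (control prior - post)."""
    return (y_t_prior - y_t_post) - (y_n_prior - y_n_post)


def cell_gain(y_t_prior, y_t_post, y_n_prior, y_n_post):
    """Sign-flipped contrast: positive when trainees gain more than controls."""
    return -cell_contrast(y_t_prior, y_t_post, y_n_prior, y_n_post)


cells = all_cells(DIMENSIONS)
print("number of cells:", len(cells))

# Hypothetical average annual earnings for one cell (illustrative only).
gain = cell_gain(y_t_prior=2400, y_t_post=3300, y_n_prior=2600, y_n_post=3200)
print("estimated gain in this cell:", gain)
```

Adding a single further two-way classification would double the number of cells to 1,600, which is the practical difficulty with cross-classification noted above.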

Comparison of Research Methods. It is important to note
that data do not speak for themselves. The model used to analyze 
data critically influences the estimation of program impact. Any 
model represents an alternative treatment of the data to control 
for the influence of factors that may obscure the true nature
of a program effect. Choices and compromises must be made among
models, due to the set of constraints each of them imposes on the
analysis. Table 5 presents different estimates of the impact of
MDTA training using the same SSA data.

Most of the experimentation with the SSA data focuses on
the 1964 class of MDTA trainees. Since Farber's work represents
one of the most extensive efforts to employ these data, we should
look at his results first. As indicated above, Farber uses a
cross-classification scheme. In addition, he excludes 1963
earnings from his pre-training earnings control period, and all
persons in the CWHS comparison group who had zero earnings in 1964.
The first adjustment has the effect of eliminating from
consideration in the analysis a period of transitorily low earnings and
high unemployment for the trainees, almost all of whom were
unemployed shortly prior to taking training. This first adjustment
results in comparing an unemployed trainee against an "average"
CWHS worker. The second adjustment clearly biases the estimate
of training effect downward. In any case, his results show
negative average earnings effects for white males over the five year
post-training period, but positive earnings effects for white and
black females. Black males have zero net benefits.

The estimate by Miller uses MDTA institutional
noncompleters as controls in an effort to avoid comparing trainees
against "average CWHS workers." However, one reason for the
dramatically different results between Miller and Farber is
the weighting scheme used by Miller to standardize for
differences among completers and noncompleters. Miller's weighting
scheme is different from that used by Farber and is only one of
many arbitrary weighting schemes one could use.

Column (6) of Table 5 represents a model estimated by
the ASPER economists that attempts to account for the individual's
condition immediately prior to training. Zero and maximum earners
have also been excluded and the sample represents persons who
began training in the first quarter of 1964 and completed training
before the end of 1964. Thus, the sample is much different
compared to the sample used by Farber. In any case, benefits to this
group are very large: over $400 per year.

More in line with the Farber method are the results from 
the analysis shown in Column (7) of Table 5, which uses an
auto-regressive multivariate framework instead of Farber's average
quarterly earnings levels and patterns to control for the
influence of pre-training earnings. Results of this method differ
from Farber's and Miller's estimates. Also, the auto-regressive
model reveals no statistically significant differences between 
using MDTA institutional noncompleters and the CWHS sample as 
comparison groups for either male or female blacks. But un- 
resolved differences still exist between noncompleters and the 
CWHS sample for male and female whites. 

The most elaborate treatment of the SSA data is by 
Jacobson. He had access to additional SSA data on employers and 
employees, and estimated positive earnings effects for all four 



See Ashenfelter, "Some Comments on....," pp. 7ff. 
groups of MDTA institutional completers. Jacobson, as well as
the ASPER group, argues that persons who enter training are
different from persons in the CWHS in that they have a higher average
unemployment rate immediately prior to training. It is necessary
to standardize for this effect in order to properly estimate the
relation between training and earnings. Jacobson adds a proxy
variable to control for differences in the probability of being
unemployed in 1963. Jacobson's estimates are the largest of the
four alternatives. We should note, however, that among the
Miller, ASPER, and Jacobson estimates, various adjustments in the
sample and alternative methods revealed a marked insensitivity
for the estimates of benefits to black females. The largest
variation in estimated benefits is for white males, for which the
difference between the highest and the lowest estimated benefit is
$299 ($99 - (-$200)).



Jacobson restricted his study sample in the following way. The
sample age was from 23 to 59 years old as of 1959. Its earnings 
did not exceed the taxable limit of $4800 in any year. Only 
those samples were included whose individual records contained 
employer reports for each year covered by the study. The sample 
was finally split into a mobile and a non-mobile group based on 
whether the industry or country of the major job changed during 
the study period. 
Clearly, these analyses show that the method and the precise
sample data used affect the estimated program impacts. The problem
at this point is whether reliance is to be placed on an empirical 
or judgmental justification for choosing among models, rather than 
on a well-developed theoretical argument. (And even the choice 
of a theoretical model involves judgment.) Over this question, 
differences of opinion can arise among analysts and policymakers. 
More research is needed to resolve these issues. The Panel 
strongly recommends that the Department of Labor support the 
research necessary to achieve such a resolution. 
V. Summary

After ten years of massive expenditures on manpower-
training programs, relatively little is known about their
educational and labor market effects. Less expensive and more
conceptually uniform data must be developed to evaluate these
programs.

Social Security data are limited in the types of socio- 
demographic information they provide, and their earnings 
measure has some shortcomings. However, the earnings data are 
very accurate, given their limitations, and very inexpensive. 
In addition, they present true longitudinal data, which are crucial
to the analysis of investments in training. These data represent 
a clear alternative to sample survey data for evaluating manpower 
programs, especially for prime-age males. 

Statistical techniques exist for overcoming in large part 
the inadequacies of the quarterly earnings data reported by SSA. 

In light of the history of manpower- training studies, the 
SSA data are just as adequate a source of comparison groups as
tailor-made sample survey studies. In some respects the SSA 
data are even better because of the potentiality for 
matching pairs of observations on selected characteristics, 
especially with respect to prior earnings patterns. 

In short, SSA data should be employed in the evaluation
of manpower programs, especially those that tend to serve
prime-age males, such as the MDTA or NAB-JOBS program. They will be
less adequate for evaluating WIN, Job Corps, and the Out-of-School 
NYC. 
Because of the problem of zero-earnings quarters and
quarters in which earnings exceed the maximum, the U.S. Department of
Labor should support a study to evaluate potential biases in the
SSA data that may result. The National Longitudinal Surveys
data or any of the SSA-CPS-IRS links matched to the SSA data will
probably be sufficient to detect most of the biases inherent in 
models of earnings determination based on these data. 

Moreover, the Department of Labor should analyze and 
evaluate the quality of data in the MARS file to determine the 
degree and source of the apparent large errors in merely reporting 
the true population of program enrollees. The Panel recognizes 
that better data cost more, but a definitive judgment must be 
made concerning the MARS data. At present not even the population 
of trainees is accurately known! Perhaps the most efficient
approach would be to sample the universe of program projects and 
develop accurate samples, rather than attempt to gather informa- 
tion on the universe of trainees. A pre-condition for this 
approach, or any evaluation for that matter, must be the develop- 
ment of accurate lists of trainees for each manpower project. 
Such lists do not yet exist in the MARS file; this is a serious 
inadequacy that must be corrected. 